Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems
Authors
Abstract
Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. While the literature abounds with performance models of checkpointing systems, none address the issue of selecting runtime parameters other than the optimal checkpointing interval. In particular, the issue of processor allocation is typically ignored. In this paper, we present a performance model for long-running parallel computations that execute with checkpointing enabled. We then discuss how it is relevant to today’s parallel computing environments and software, and present case studies of using the model to select runtime parameters.
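The abstract does not state the paper's model, so the following is only a minimal sketch of why processor allocation and checkpoint interval interact. It uses Young's well-known first-order approximation for the checkpoint interval, not the authors' model. The function names, the assumption of independent per-node failures (so the system MTBF is the node MTBF divided by the processor count), and the numbers in the usage loop are all illustrative assumptions.

```python
import math


def system_mtbf(node_mtbf: float, n_procs: int) -> float:
    """Mean time between failures of the whole allocation.

    Assumption (not from the paper): nodes fail independently at the
    same constant rate, so the failure rate scales linearly with the
    number of processors.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return node_mtbf / n_procs


def young_interval(checkpoint_cost: float, node_mtbf: float, n_procs: int) -> float:
    """Young's first-order checkpoint interval: sqrt(2 * C * M).

    C is the time to write one checkpoint and M is the system MTBF,
    both in the same time unit (seconds here).
    """
    return math.sqrt(2.0 * checkpoint_cost * system_mtbf(node_mtbf, n_procs))


if __name__ == "__main__":
    # Hypothetical values: 50 s per checkpoint, 100000 s MTBF per node.
    for n in (1, 10, 100):
        print(n, round(young_interval(50.0, 100000.0, n), 1))
```

Under these assumptions, allocating more processors shortens the suggested interval, which is one reason the two parameters are hard to choose independently.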
Similar resources
Proposed Feature Selection for Dynamic Thermal Management in Multicore Systems
Increasing the number of cores to meet the demand for more computing power has raised processor temperatures in multi-core systems. One of the main approaches to reducing temperature is dynamic thermal management. These methods are divided into two classes, reactive and proactive. Proactive methods manage the processor temperature by forecasting the temperature be...
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and, as a result, the workload of Mobile ...
Resilient Supplier Selection in a Supply Chain by a New Interval-Valued Fuzzy Group Decision Model Based on Possibilistic Statistical Concepts
Supplier selection is one of the main concerns in supply chain networks, given their global and competitive features. Resilient supplier selection, as a relatively new idea, has not been properly addressed in the literature under uncertain conditions. Therefore, in this paper, a new multi-criteria group decision-making (MCGDM) model is introduced with interval-valued fuzzy sets (I...
DisTriB: Distributed Trust Management Model Based on Gossip Learning and Bayesian Networks in Collaborative Computing Systems
The interactions among peers in Peer-to-Peer systems, as distributed collaborative systems, are based on asynchronous and unreliable communications. Trust is an essential and facilitating component in these interactions, especially in such uncertain environments. The large scale and openness of these systems make various attacks possible, which affect trust. Peers do not have enough inform...
An Improved Adaptive Space-Sharing Scheduling Policy for Non-dedicated Heterogeneous Cluster Systems
Adaptive space-sharing scheduling algorithms tend to improve the performance of clusters by allocating processors to jobs based on the current system load. Existing adaptive algorithms focus on dedicated homogeneous and heterogeneous clusters. However, commodity clusters are naturally non-dedicated and tend to become heterogeneous over time, as cluster hardware is usually upgraded and n...
Journal title:
- J. Parallel Distrib. Comput.
Volume 61, Issue
Pages -
Publication date: 2001